Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 434 | 444 |
| Missing cells (%) | 8.1% | 8.3% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
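The "Missing cells (%)" row can be checked from the other rows of this table: the total number of cells is variables times observations. A minimal sketch, assuming that definition; the helper name `missing_pct` is illustrative and not part of ydata-profiling:

```python
# Figures copied from the "Dataset statistics" table.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 434, "Dataset B": 444}


def missing_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells as a percentage of all cells (n_vars * n_obs)."""
    return 100.0 * missing / (n_vars * n_obs)


missing_report = {
    name: round(missing_pct(count, N_VARIABLES, N_OBSERVATIONS), 1)
    for name, count in MISSING_CELLS.items()
}

for name, pct in missing_report.items():
    print(f"{name}: {pct}% missing cells")
```

Both results agree with the percentages reported in the table.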
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Age has 86 (19.3%) missing values | Age has 95 (21.3%) missing values | Missing |
| Cabin has 346 (77.6%) missing values | Cabin has 349 (78.3%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 317 (71.1%) zeros | SibSp has 308 (69.1%) zeros | Zeros |
| Parch has 343 (76.9%) zeros | Parch has 346 (77.6%) zeros | Zeros |
| Fare has 10 (2.2%) zeros | Fare has 5 (1.1%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-08-01 08:59:41.769583 | 2023-08-01 08:59:47.057435 |
| Analysis finished | 2023-08-01 08:59:47.055835 | 2023-08-01 08:59:52.116390 |
| Duration | 5.29 seconds | 5.06 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
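The "Duration" row is the difference between the "Analysis finished" and "Analysis started" timestamps. A small sketch that recomputes it with the standard library; the names `RUNS` and `duration_seconds` are illustrative:

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Start/finish pairs copied from the "Reproduction" table.
RUNS = {
    "Dataset A": ("2023-08-01 08:59:41.769583", "2023-08-01 08:59:47.055835"),
    "Dataset B": ("2023-08-01 08:59:47.057435", "2023-08-01 08:59:52.116390"),
}


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


durations = {name: round(duration_seconds(*span), 2) for name, span in RUNS.items()}
print(durations)
```

Rounded to two decimals, the results match the reported 5.29 and 5.06 seconds.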
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 446.98655 | 444.23318 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 2 | 2 |
| Maximum | 890 | 889 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 2 | 2 |
| 5-th percentile | 47.5 | 43.5 |
| Q1 | 239.25 | 229.5 |
| median | 441.5 | 448.5 |
| Q3 | 665.75 | 662.75 |
| 95-th percentile | 847.75 | 846.5 |
| Maximum | 890 | 889 |
| Range | 888 | 887 |
| Interquartile range (IQR) | 426.5 | 433.25 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 255.12225 | 258.55853 |
| Coefficient of variation (CV) | 0.57076046 | 0.58203335 |
| Kurtosis | -1.146847 | -1.2119741 |
| Mean | 446.98655 | 444.23318 |
| Median Absolute Deviation (MAD) | 214 | 216.5 |
| Skewness | 0.004981964 | -0.014959797 |
| Sum | 199356 | 198128 |
| Variance | 65087.362 | 66852.512 |
| Monotonicity | Not monotonic | Not monotonic |
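Several rows in the quantile and descriptive tables are derived from others: range is maximum minus minimum, IQR is Q3 minus Q1, mean is sum over count, variance is the squared standard deviation, and CV is standard deviation over mean. The sketch below checks the Dataset A figures for PassengerId against those relations; the dictionary names are illustrative, and the tolerance is an assumption that allows for the report's rounding:

```python
import math

# Reported PassengerId statistics for Dataset A, copied from the tables above.
reported = {
    "min": 2, "max": 890, "q1": 239.25, "q3": 665.75,
    "sum": 199356, "count": 446,
    "mean": 446.98655, "std": 255.12225,
    "variance": 65087.362, "cv": 0.57076046,
    "range": 888, "iqr": 426.5,
}

derived = {
    "range": reported["max"] - reported["min"],    # maximum - minimum
    "iqr": reported["q3"] - reported["q1"],        # Q3 - Q1
    "mean": reported["sum"] / reported["count"],   # sum / number of values
    "variance": reported["std"] ** 2,              # squared standard deviation
    "cv": reported["std"] / reported["mean"],      # standard deviation / mean
}

for key, value in derived.items():
    # The report shows about 8 significant digits, so compare loosely.
    consistent = math.isclose(value, reported[key], rel_tol=1e-6)
    print(f"{key}: derived={value:.8g} reported={reported[key]} consistent={consistent}")
```

The same relations can be applied to the Dataset B column and to the other numeric variables.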
Common values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 228 | 1 | 0.2% |
| 99 | 1 | 0.2% |
| 237 | 1 | 0.2% |
| 218 | 1 | 0.2% |
| 103 | 1 | 0.2% |
| 719 | 1 | 0.2% |
| 710 | 1 | 0.2% |
| 329 | 1 | 0.2% |
| 158 | 1 | 0.2% |
| 701 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 619 | 1 | 0.2% |
| 81 | 1 | 0.2% |
| 87 | 1 | 0.2% |
| 565 | 1 | 0.2% |
| 588 | 1 | 0.2% |
| 231 | 1 | 0.2% |
| 177 | 1 | 0.2% |
| 843 | 1 | 0.2% |
| 832 | 1 | 0.2% |
| 583 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
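Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 18 | 1 | 0.2% |
| 21 | 1 | 0.2% |
| 23 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| 17 | 1 | 0.2% |
| 20 | 1 | 0.2% |
| 22 | 1 | 0.2% |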
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 1 | 0 |
| 3rd row | 0 | 1 |
| 4th row | 0 | 0 |
| 5th row | 1 | 0 |
Common Values
| Value | Dataset A count | Dataset A frequency (%) | Dataset B count | Dataset B frequency (%) |
|---|---|---|---|---|
| 0 | 283 | 63.5% | 282 | 63.2% |
| 1 | 163 | 36.5% | 164 | 36.8% |
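Each "Frequency (%)" value in a common-values table is the count divided by the number of rows in that dataset (446 here). A minimal sketch for Survived, assuming one-decimal rounding as used elsewhere in the report; `frequencies` is an illustrative helper:

```python
# Survived counts copied from the "Common Values" section (446 rows per dataset).
COUNTS = {
    "Dataset A": {"0": 283, "1": 163},
    "Dataset B": {"0": 282, "1": 164},
}


def frequencies(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw counts to percentages of the column total, rounded to 0.1."""
    total = sum(counts.values())
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}


survived_freq = {name: frequencies(counts) for name, counts in COUNTS.items()}
print(survived_freq)
```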
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 2 |
| 2nd row | 1 | 3 |
| 3rd row | 2 | 2 |
| 4th row | 3 | 3 |
| 5th row | 1 | 3 |
Common Values
| Value | Dataset A count | Dataset A frequency (%) | Dataset B count | Dataset B frequency (%) |
|---|---|---|---|---|
| 3 | 244 | 54.7% | 244 | 54.7% |
| 1 | 101 | 22.6% | 104 | 23.3% |
| 2 | 101 | 22.6% | 98 | 22.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 101 | |
| 2 | 101 |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 101 | |
| 2 | 101 |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 101 | |
| 2 | 101 |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 101 | |
| 2 | 101 |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 104 | |
| 2 | 98 |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 65 | 82 |
| Median length | 47 | 50 |
| Mean length | 26.497758 | 27.273543 |
| Min length | 12 | 12 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 11818 | 12164 |
| Distinct characters | 60 | 59 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Lovell, Mr. John Hall ("Henry") | Becker, Miss. Marion Louise |
| 2nd row | Marechal, Mr. Pierre | McEvoy, Mr. Michael |
| 3rd row | Matthews, Mr. William John | Richards, Master. William Rowe |
| 4th row | Van Impe, Miss. Catharina | Murdlin, Mr. Joseph |
| 5th row | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) | Osen, Mr. Olaf Elon |
| Value | Count | Frequency (%) |
| mr | 278 | 15.4% |
| miss | 84 | 4.7% |
| mrs | 54 | 3.0% |
| john | 31 | 1.7% |
| william | 29 | 1.6% |
| master | 17 | 0.9% |
| henry | 16 | 0.9% |
| james | 14 | 0.8% |
| charles | 11 | 0.6% |
| thomas | 11 | 0.6% |
| Other values (892) | 1259 |
| Value | Count | Frequency (%) |
| mr | 261 | 14.2% |
| miss | 82 | 4.5% |
| mrs | 75 | 4.1% |
| william | 30 | 1.6% |
| john | 26 | 1.4% |
| master | 18 | 1.0% |
| henry | 17 | 0.9% |
| anna | 14 | 0.8% |
| mary | 14 | 0.8% |
| george | 13 | 0.7% |
| Other values (894) | 1284 |
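The two word tables above list lowercase tokens such as `mr`, `miss`, and `mrs`, which suggests the names are case-folded and split on punctuation before counting. The sketch below reproduces that kind of count on the five Dataset A sample names. The regular expression is an assumption and may differ from the profiler's own tokenizer; `word_counts` is an illustrative helper:

```python
import re
from collections import Counter

# The five Dataset A sample names shown in the "Sample" table.
SAMPLE_NAMES = [
    'Lovell, Mr. John Hall ("Henry")',
    "Marechal, Mr. Pierre",
    "Matthews, Mr. William John",
    "Van Impe, Miss. Catharina",
    "Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",
]


def word_counts(values: list[str]) -> Counter:
    """Count lowercase alphabetic tokens across all values.

    Assumption: punctuation is dropped and case is folded; the profiler's
    real tokenizer may differ in detail.
    """
    counts: Counter = Counter()
    for value in values:
        counts.update(re.findall(r"[a-z]+", value.lower()))
    return counts


name_words = word_counts(SAMPLE_NAMES)
print(name_words.most_common(3))
```

On this small sample, `mr` is already the most frequent token, in line with the full tables.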
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1359 | 11.5% |
| r | 974 | 8.2% |
| e | 819 | 6.9% |
| a | 797 | 6.7% |
| n | 687 | 5.8% |
| i | 642 | 5.4% |
| s | 614 | 5.2% |
| M | 550 | 4.7% |
| l | 509 | 4.3% |
| o | 488 | 4.1% |
| Other values (50) | 4379 |
| Value | Count | Frequency (%) |
| (space) | 1389 | 11.4% |
| r | 995 | 8.2% |
| a | 879 | 7.2% |
| e | 868 | 7.1% |
| s | 656 | 5.4% |
| n | 651 | 5.4% |
| i | 628 | 5.2% |
| M | 577 | 4.7% |
| l | 531 | 4.4% |
| o | 509 | 4.2% |
| Other values (49) | 4481 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7550 | |
| Uppercase Letter | 1812 | 15.3% |
| Space Separator | 1359 | 11.5% |
| Other Punctuation | 958 | 8.1% |
| Close Punctuation | 67 | 0.6% |
| Open Punctuation | 67 | 0.6% |
| Dash Punctuation | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7814 | |
| Uppercase Letter | 1842 | 15.1% |
| Space Separator | 1389 | 11.4% |
| Other Punctuation | 947 | 7.8% |
| Open Punctuation | 83 | 0.7% |
| Close Punctuation | 83 | 0.7% |
| Dash Punctuation | 6 | < 0.1% |
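The category names in these tables are Unicode general categories. Python's standard `unicodedata.category` returns the two-letter code for a character, so a similar breakdown can be sketched as follows. The mapping `CATEGORY_NAMES` covers only the categories that appear in this report, and `category_counts` is an illustrative helper:

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


# One of the sample names from the report.
sample_categories = category_counts("Marechal, Mr. Pierre")
print(sample_categories)
```

For this name the counts are 13 lowercase letters, 3 uppercase letters, 2 spaces, and 2 punctuation marks (the comma and the period).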
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1359 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 1389 | 100.0% |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 974 | |
| e | 819 | |
| a | 797 | |
| n | 687 | |
| i | 642 | |
| s | 614 | |
| l | 509 | 6.7% |
| o | 488 | 6.5% |
| t | 324 | 4.3% |
| h | 264 | 3.5% |
| Other values (16) | 1432 |
| Value | Count | Frequency (%) |
| r | 995 | |
| a | 879 | |
| e | 868 | |
| s | 656 | |
| n | 651 | |
| i | 628 | |
| l | 531 | 6.8% |
| o | 509 | 6.5% |
| t | 336 | 4.3% |
| h | 262 | 3.4% |
| Other values (16) | 1499 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 550 | |
| A | 140 | 7.7% |
| J | 126 | 7.0% |
| H | 98 | 5.4% |
| S | 91 | 5.0% |
| C | 88 | 4.9% |
| E | 78 | 4.3% |
| L | 73 | 4.0% |
| W | 69 | 3.8% |
| B | 68 | 3.8% |
| Other values (15) | 431 |
| Value | Count | Frequency (%) |
| M | 577 | |
| A | 131 | 7.1% |
| J | 111 | 6.0% |
| H | 106 | 5.8% |
| S | 94 | 5.1% |
| C | 87 | 4.7% |
| E | 83 | 4.5% |
| W | 71 | 3.9% |
| B | 66 | 3.6% |
| R | 64 | 3.5% |
| Other values (15) | 452 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 446 | |
| . | 446 | |
| " | 60 | 6.3% |
| ' | 5 | 0.5% |
| / | 1 | 0.1% |
| Value | Count | Frequency (%) |
| . | 446 | |
| , | 446 | |
| " | 50 | 5.3% |
| ' | 5 | 0.5% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 67 |
| Value | Count | Frequency (%) |
| ) | 83 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 67 |
| Value | Count | Frequency (%) |
| ( | 83 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5 |
| Value | Count | Frequency (%) |
| - | 6 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9362 | |
| Common | 2456 | 20.8% |
| Value | Count | Frequency (%) |
| Latin | 9656 | |
| Common | 2508 | 20.6% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1359 | 55.3% |
| , | 446 | 18.2% |
| . | 446 | 18.2% |
| ) | 67 | 2.7% |
| ( | 67 | 2.7% |
| " | 60 | 2.4% |
| - | 5 | 0.2% |
| ' | 5 | 0.2% |
| / | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| (space) | 1389 | 55.4% |
| . | 446 | 17.8% |
| , | 446 | 17.8% |
| ( | 83 | 3.3% |
| ) | 83 | 3.3% |
| " | 50 | 2.0% |
| - | 6 | 0.2% |
| ' | 5 | 0.2% |
Latin
| Value | Count | Frequency (%) |
| r | 974 | 10.4% |
| e | 819 | 8.7% |
| a | 797 | 8.5% |
| n | 687 | 7.3% |
| i | 642 | 6.9% |
| s | 614 | 6.6% |
| M | 550 | 5.9% |
| l | 509 | 5.4% |
| o | 488 | 5.2% |
| t | 324 | 3.5% |
| Other values (41) | 2958 |
| Value | Count | Frequency (%) |
| r | 995 | 10.3% |
| a | 879 | 9.1% |
| e | 868 | 9.0% |
| s | 656 | 6.8% |
| n | 651 | 6.7% |
| i | 628 | 6.5% |
| M | 577 | 6.0% |
| l | 531 | 5.5% |
| o | 509 | 5.3% |
| t | 336 | 3.5% |
| Other values (41) | 3026 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11818 |
| Value | Count | Frequency (%) |
| ASCII | 12164 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1359 | 11.5% |
| r | 974 | 8.2% |
| e | 819 | 6.9% |
| a | 797 | 6.7% |
| n | 687 | 5.8% |
| i | 642 | 5.4% |
| s | 614 | 5.2% |
| M | 550 | 4.7% |
| l | 509 | 4.3% |
| o | 488 | 4.1% |
| Other values (50) | 4379 |
| Value | Count | Frequency (%) |
| (space) | 1389 | 11.4% |
| r | 995 | 8.2% |
| a | 879 | 7.2% |
| e | 868 | 7.1% |
| s | 656 | 5.4% |
| n | 651 | 5.4% |
| i | 628 | 5.2% |
| M | 577 | 4.7% |
| l | 531 | 4.4% |
| o | 509 | 4.2% |
| Other values (49) | 4481 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6278027 | 4.7085202 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2064 | 2100 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | female |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | female | male |
| 5th row | female | male |
Common Values
| Value | Dataset A count | Dataset A frequency (%) | Dataset B count | Dataset B frequency (%) |
|---|---|---|---|---|
| male | 306 | 68.6% | 288 | 64.6% |
| female | 140 | 31.4% | 158 | 35.4% |
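The "Total characters" and "Mean length" rows reported earlier for Sex follow directly from these counts, because `male` has 4 characters and `female` has 6. A short sketch of that arithmetic; `length_summary` is an illustrative helper:

```python
# Sex counts from the "Common Values" section.
SEX_COUNTS = {
    "Dataset A": {"male": 306, "female": 140},
    "Dataset B": {"male": 288, "female": 158},
}


def length_summary(counts: dict[str, int]) -> tuple[int, float]:
    """Return (total characters, mean length) for a column of string values."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    n_values = sum(counts.values())
    return total_chars, total_chars / n_values


sex_lengths = {name: length_summary(counts) for name, counts in SEX_COUNTS.items()}
for name, (total, mean) in sex_lengths.items():
    print(f"{name}: total characters={total}, mean length={mean:.7f}")
```

The totals (2064 and 2100) and mean lengths (4.6278027 and 4.7085202) match the report.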
Most occurring characters
| Value | Count | Frequency (%) |
| e | 586 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 140 | 6.8% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2064 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2100 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 586 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 140 | 6.8% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2064 |
| Value | Count | Frequency (%) |
| Latin | 2100 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 586 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 140 | 6.8% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2064 |
| Value | Count | Frequency (%) |
| ASCII | 2100 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 586 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 140 | 6.8% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 75 | 74 |
| Distinct (%) | 20.8% | 21.1% |
| Missing | 86 | 95 |
| Missing (%) | 19.3% | 21.3% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.503 | 29.657407 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 70.5 | 74 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 4 | 5.5 |
| Q1 | 21 | 21 |
| median | 28 | 29 |
| Q3 | 37 | 36 |
| 95-th percentile | 57.05 | 58 |
| Maximum | 70.5 | 74 |
| Range | 70.08 | 73.58 |
| Interquartile range (IQR) | 16 | 15 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.387236 | 14.099824 |
| Coefficient of variation (CV) | 0.48765333 | 0.47542336 |
| Kurtosis | 0.17861103 | 0.44446163 |
| Mean | 29.503 | 29.657407 |
| Median Absolute Deviation (MAD) | 8 | 8 |
| Skewness | 0.37078596 | 0.48797487 |
| Sum | 10621.08 | 10409.75 |
| Variance | 206.99257 | 198.80505 |
| Monotonicity | Not monotonic | Not monotonic |
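Age has missing values, and the reported mean is consistent with dividing the sum by the number of non-missing values rather than by all 446 rows. A minimal check using only figures from the tables above; the dictionary names are illustrative:

```python
# Age figures copied from the tables above (446 rows per dataset).
AGE = {
    "Dataset A": {"rows": 446, "missing": 86, "sum": 10621.08},
    "Dataset B": {"rows": 446, "missing": 95, "sum": 10409.75},
}

age_summary = {}
for name, stats in AGE.items():
    present = stats["rows"] - stats["missing"]   # number of non-missing ages
    age_summary[name] = {
        "present": present,
        "mean": stats["sum"] / present,          # mean over non-missing values only
    }

for name, summary in age_summary.items():
    print(f"{name}: n={summary['present']}, mean={summary['mean']:.6f}")
```

Dividing by 446 instead would give roughly 23.8 for Dataset A, which does not match the reported 29.503.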
Common values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 28 | 15 | 3.4% |
| 24 | 15 | 3.4% |
| 30 | 13 | 2.9% |
| 19 | 13 | 2.9% |
| 21 | 13 | 2.9% |
| 22 | 13 | 2.9% |
| 25 | 13 | 2.9% |
| 29 | 13 | 2.9% |
| 26 | 11 | 2.5% |
| 36 | 11 | 2.5% |
| Other values (65) | 230 | 51.6% |
| (Missing) | 86 | 19.3% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 30 | 17 | 3.8% |
| 28 | 16 | 3.6% |
| 18 | 16 | 3.6% |
| 29 | 15 | 3.4% |
| 21 | 13 | 2.9% |
| 36 | 12 | 2.7% |
| 19 | 12 | 2.7% |
| 22 | 12 | 2.7% |
| 35 | 12 | 2.7% |
| 16 | 11 | 2.5% |
| Other values (64) | 215 | 48.2% |
| (Missing) | 95 | 21.3% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.83 | 2 | 0.4% |
| 1 | 5 | 1.1% |
| 2 | 5 | 1.1% |
| 3 | 2 | 0.4% |
| 4 | 6 | 1.3% |
| 5 | 3 | 0.7% |
| 6 | 2 | 0.4% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 2 | 0.4% |
| 0.92 | 1 | 0.2% |
| 1 | 2 | 0.4% |
| 2 | 3 | 0.7% |
| 3 | 3 | 0.7% |
| 4 | 4 | 0.9% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 7 |
| Distinct (%) | 1.3% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.45964126 | 0.50224215 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 8 |
| Zeros | 317 | 308 |
| Zeros (%) | 71.1% | 69.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2.75 | 2 |
| Maximum | 5 | 8 |
| Range | 5 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.92745095 | 1.133254 |
| Coefficient of variation (CV) | 2.0177713 | 2.2563897 |
| Kurtosis | 7.6715909 | 21.972078 |
| Mean | 0.45964126 | 0.50224215 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.6820674 | 4.1914151 |
| Sum | 205 | 224 |
| Variance | 0.86016526 | 1.2842646 |
| Monotonicity | Not monotonic | Not monotonic |
Common values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 317 | 71.1% |
| 1 | 93 | 20.9% |
| 2 | 13 | 2.9% |
| 4 | 11 | 2.5% |
| 3 | 9 | 2.0% |
| 5 | 3 | 0.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 308 | 69.1% |
| 1 | 107 | 24.0% |
| 2 | 12 | 2.7% |
| 3 | 6 | 1.3% |
| 4 | 5 | 1.1% |
| 8 | 5 | 1.1% |
| 5 | 3 | 0.7% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 317 | 71.1% |
| 1 | 93 | 20.9% |
| 2 | 13 | 2.9% |
| 3 | 9 | 2.0% |
| 4 | 11 | 2.5% |
| 5 | 3 | 0.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 308 | 69.1% |
| 1 | 107 | 24.0% |
| 2 | 12 | 2.7% |
| 3 | 6 | 1.3% |
| 4 | 5 | 1.1% |
| 5 | 3 | 0.7% |
| 8 | 5 | 1.1% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.3632287 | 0.36547085 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 5 |
| Zeros | 343 | 346 |
| Zeros (%) | 76.9% | 77.6% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 5 |
| Range | 5 | 5 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.76313509 | 0.79241949 |
| Coefficient of variation (CV) | 2.1009769 | 2.1682153 |
| Kurtosis | 8.0356805 | 8.8295011 |
| Mean | 0.3632287 | 0.36547085 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.5407068 | 2.6804687 |
| Sum | 162 | 163 |
| Variance | 0.58237517 | 0.62792865 |
| Monotonicity | Not monotonic | Not monotonic |
Common values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 343 | 76.9% |
| 1 | 55 | 12.3% |
| 2 | 43 | 9.6% |
| 5 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 346 | 77.6% |
| 1 | 52 | 11.7% |
| 2 | 40 | 9.0% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
| 4 | 1 | 0.2% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 343 | 76.9% |
| 1 | 55 | 12.3% |
| 2 | 43 | 9.6% |
| 3 | 1 | 0.2% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 346 | 77.6% |
| 1 | 52 | 11.7% |
| 2 | 40 | 9.0% |
| 3 | 4 | 0.9% |
| 4 | 1 | 0.2% |
| 5 | 3 | 0.7% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 389 | 383 |
| Distinct (%) | 87.2% | 85.9% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.8161435 | 6.6569507 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 3040 | 2969 |
| Distinct characters | 35 | 32 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 349 | 332 |
| Unique (%) | 78.3% | 74.4% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | A/5 21173 | 230136 |
| 2nd row | 11774 | 36568 |
| 3rd row | 28228 | 29106 |
| 4th row | 345773 | A./5. 3235 |
| 5th row | 16966 | 7534 |
| Value | Count | Frequency (%) |
| pc | 30 | 5.3% |
| c.a | 12 | 2.1% |
| 2 | 8 | 1.4% |
| ston/o | 8 | 1.4% |
| soton/o.q | 6 | 1.1% |
| a/5 | 6 | 1.1% |
| soton/oq | 6 | 1.1% |
| w./c | 5 | 0.9% |
| 347082 | 5 | 0.9% |
| 347077 | 4 | 0.7% |
| Other values (407) | 475 |
| Value | Count | Frequency (%) |
| pc | 25 | 4.5% |
| c.a | 13 | 2.3% |
| ca | 8 | 1.4% |
| w./c | 7 | 1.3% |
| a/5 | 7 | 1.3% |
| 2343 | 5 | 0.9% |
| soton/oq | 5 | 0.9% |
| 2 | 4 | 0.7% |
| ston/o | 4 | 0.7% |
| 347082 | 4 | 0.7% |
| Other values (404) | 476 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 341 | |
| 2 | 286 | |
| 7 | 251 | 8.3% |
| 4 | 233 | 7.7% |
| 6 | 214 | 7.0% |
| 0 | 209 | 6.9% |
| 5 | 190 | 6.2% |
| 9 | 166 | 5.5% |
| 8 | 148 | 4.9% |
| Other values (25) | 630 |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 319 | |
| 2 | 313 | |
| 4 | 241 | |
| 7 | 229 | 7.7% |
| 0 | 210 | 7.1% |
| 6 | 210 | 7.1% |
| 5 | 188 | 6.3% |
| 9 | 169 | 5.7% |
| 8 | 148 | 5.0% |
| Other values (22) | 575 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2410 | |
| Uppercase Letter | 348 | 11.4% |
| Other Punctuation | 155 | 5.1% |
| Space Separator | 119 | 3.9% |
| Lowercase Letter | 8 | 0.3% |
| Value | Count | Frequency (%) |
| Decimal Number | 2394 | |
| Uppercase Letter | 305 | 10.3% |
| Other Punctuation | 153 | 5.2% |
| Space Separator | 112 | 3.8% |
| Lowercase Letter | 5 | 0.2% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 341 | |
| 2 | 286 | |
| 7 | 251 | |
| 4 | 233 | |
| 6 | 214 | |
| 0 | 209 | |
| 5 | 190 | |
| 9 | 166 | |
| 8 | 148 | 6.1% |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 319 | |
| 2 | 313 | |
| 4 | 241 | |
| 7 | 229 | |
| 0 | 210 | |
| 6 | 210 | |
| 5 | 188 | |
| 9 | 169 | |
| 8 | 148 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 119 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 112 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 104 | |
| / | 51 |
| Value | Count | Frequency (%) |
| . | 105 | |
| / | 48 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 74 | |
| O | 62 | |
| P | 47 | |
| S | 43 | |
| A | 34 | |
| N | 24 | 6.9% |
| T | 21 | 6.0% |
| Q | 12 | 3.4% |
| W | 8 | 2.3% |
| I | 6 | 1.7% |
| Other values (6) | 17 | 4.9% |
| Value | Count | Frequency (%) |
| C | 72 | |
| O | 46 | |
| P | 45 | |
| A | 42 | |
| S | 34 | |
| N | 17 | 5.6% |
| T | 16 | 5.2% |
| W | 10 | 3.3% |
| Q | 8 | 2.6% |
| I | 4 | 1.3% |
| Other values (5) | 11 | 3.6% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2 | |
| s | 2 | |
| r | 1 | |
| i | 1 | |
| l | 1 | |
| e | 1 |
| Value | Count | Frequency (%) |
| a | 2 | |
| r | 1 | |
| i | 1 | |
| s | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2684 | |
| Latin | 356 | 11.7% |
| Value | Count | Frequency (%) |
| Common | 2659 | |
| Latin | 310 | 10.4% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 341 | |
| 2 | 286 | |
| 7 | 251 | |
| 4 | 233 | |
| 6 | 214 | |
| 0 | 209 | |
| 5 | 190 | |
| 9 | 166 | |
| 8 | 148 | 5.5% |
| Other values (3) | 274 |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 319 | |
| 2 | 313 | |
| 4 | 241 | |
| 7 | 229 | |
| 0 | 210 | |
| 6 | 210 | |
| 5 | 188 | |
| 9 | 169 | |
| 8 | 148 | |
| Other values (3) | 265 |
Latin
| Value | Count | Frequency (%) |
| C | 74 | |
| O | 62 | |
| P | 47 | |
| S | 43 | |
| A | 34 | |
| N | 24 | 6.7% |
| T | 21 | 5.9% |
| Q | 12 | 3.4% |
| W | 8 | 2.2% |
| I | 6 | 1.7% |
| Other values (12) | 25 | 7.0% |
| Value | Count | Frequency (%) |
| C | 72 | |
| O | 46 | |
| P | 45 | |
| A | 42 | |
| S | 34 | |
| N | 17 | 5.5% |
| T | 16 | 5.2% |
| W | 10 | 3.2% |
| Q | 8 | 2.6% |
| I | 4 | 1.3% |
| Other values (9) | 16 | 5.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3040 |
| Value | Count | Frequency (%) |
| ASCII | 2969 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 341 | |
| 2 | 286 | |
| 7 | 251 | 8.3% |
| 4 | 233 | 7.7% |
| 6 | 214 | 7.0% |
| 0 | 209 | 6.9% |
| 5 | 190 | 6.2% |
| 9 | 166 | 5.5% |
| 8 | 148 | 4.9% |
| Other values (25) | 630 |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 319 | |
| 2 | 313 | |
| 4 | 241 | |
| 7 | 229 | 7.7% |
| 0 | 210 | 7.1% |
| 6 | 210 | 7.1% |
| 5 | 188 | 6.3% |
| 9 | 169 | 5.7% |
| 8 | 148 | 5.0% |
| Other values (22) | 575 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 176 | 172 |
| Distinct (%) | 39.5% | 38.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.335957 | 32.871944 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 10 | 5 |
| Zeros (%) | 2.2% | 1.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| Dataset A | Dataset B | |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.05 | 7.2292 |
| Q1 | 7.8958 | 7.95105 |
| median | 13 | 14.4583 |
| Q3 | 30 | 30.5 |
| 95-th percentile | 112.67708 | 112.67708 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 22.1042 | 22.54895 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 52.374575 | 55.376185 |
| Coefficient of variation (CV) | 1.671389 | 1.6846033 |
| Kurtosis | 35.709632 | 38.663772 |
| Mean | 31.335957 | 32.871944 |
| Median Absolute Deviation (MAD) | 5.7729 | 6.7083 |
| Skewness | 5.1115104 | 5.383644 |
| Sum | 13975.837 | 14660.887 |
| Variance | 2743.0962 | 3066.5219 |
| Monotonicity | Not monotonic | Not monotonic |
Common values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 22 | 4.9% |
| 7.8958 | 21 | 4.7% |
| 8.05 | 21 | 4.7% |
| 26 | 18 | 4.0% |
| 7.75 | 15 | 3.4% |
| 10.5 | 13 | 2.9% |
| 26.55 | 11 | 2.5% |
| 0 | 10 | 2.2% |
| 7.925 | 8 | 1.8% |
| 7.25 | 7 | 1.6% |
| Other values (166) | 300 | 67.3% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 8.05 | 27 | 6.1% |
| 13 | 22 | 4.9% |
| 7.8958 | 21 | 4.7% |
| 26 | 20 | 4.5% |
| 7.75 | 17 | 3.8% |
| 10.5 | 16 | 3.6% |
| 26.55 | 10 | 2.2% |
| 7.2292 | 7 | 1.6% |
| 7.225 | 7 | 1.6% |
| 7.925 | 6 | 1.3% |
| Other values (162) | 293 | 65.7% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10 | 2.2% |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5 | 1.1% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.8583 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.225 | 7 | 1.6% |
| 7.2292 | 7 | 1.6% |
Cabin
Text
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 85 | 81 |
| Distinct (%) | 85.0% | 83.5% |
| Missing | 346 | 349 |
| Missing (%) | 77.6% | 78.3% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.73 | 3.5154639 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 373 | 341 |
| Distinct characters | 19 | 19 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 74 | 67 |
| Unique (%) | 74.0% | 69.1% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | C47 | F4 |
| 2nd row | E34 | D49 |
| 3rd row | A14 | B77 |
| 4th row | F2 | B51 B53 B55 |
| 5th row | E50 | C106 |
| Value | Count | Frequency (%) |
| c23 | 4 | 3.4% |
| c27 | 4 | 3.4% |
| c25 | 4 | 3.4% |
| d | 3 | 2.5% |
| e101 | 3 | 2.5% |
| c52 | 2 | 1.7% |
| b98 | 2 | 1.7% |
| b96 | 2 | 1.7% |
| b28 | 2 | 1.7% |
| b20 | 2 | 1.7% |
| Other values (86) | 91 | 76.5% |
| Value | Count | Frequency (%) |
| c22 | 3 | 2.7% |
| g6 | 3 | 2.7% |
| c26 | 3 | 2.7% |
| f | 3 | 2.7% |
| f33 | 2 | 1.8% |
| b77 | 2 | 1.8% |
| b20 | 2 | 1.8% |
| g73 | 2 | 1.8% |
| e33 | 2 | 1.8% |
| b18 | 2 | 1.8% |
| Other values (82) | 89 | 78.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 44 | 11.8% |
| 2 | 42 | 11.3% |
| 1 | 37 | 9.9% |
| 5 | 26 | 7.0% |
| 3 | 25 | 6.7% |
| B | 25 | 6.7% |
| 6 | 22 | 5.9% |
| (space) | 19 | 5.1% |
| 4 | 19 | 5.1% |
| 0 | 18 | 4.8% |
| Other values (9) | 96 | 25.7% |
| Value | Count | Frequency (%) |
| C | 33 | 9.7% |
| B | 31 | 9.1% |
| 1 | 30 | 8.8% |
| 6 | 28 | 8.2% |
| 2 | 28 | 8.2% |
| 3 | 28 | 8.2% |
| 7 | 21 | 6.2% |
| 5 | 19 | 5.6% |
| 8 | 19 | 5.6% |
| D | 18 | 5.3% |
| Other values (9) | 86 | 25.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 235 | 63.0% |
| Uppercase Letter | 119 | 31.9% |
| Space Separator | 19 | 5.1% |
| Value | Count | Frequency (%) |
| Decimal Number | 212 | 62.2% |
| Uppercase Letter | 113 | 33.1% |
| Space Separator | 16 | 4.7% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 44 | 37.0% |
| B | 25 | 21.0% |
| E | 17 | 14.3% |
| D | 15 | 12.6% |
| A | 8 | 6.7% |
| F | 6 | 5.0% |
| G | 3 | 2.5% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 33 | 29.2% |
| B | 31 | 27.4% |
| D | 18 | 15.9% |
| E | 12 | 10.6% |
| F | 8 | 7.1% |
| G | 6 | 5.3% |
| A | 4 | 3.5% |
| T | 1 | 0.9% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 42 | 17.9% |
| 1 | 37 | 15.7% |
| 5 | 26 | 11.1% |
| 3 | 25 | 10.6% |
| 6 | 22 | 9.4% |
| 4 | 19 | 8.1% |
| 0 | 18 | 7.7% |
| 7 | 17 | 7.2% |
| 8 | 16 | 6.8% |
| 9 | 13 | 5.5% |
| Value | Count | Frequency (%) |
| 1 | 30 | 14.2% |
| 6 | 28 | 13.2% |
| 2 | 28 | 13.2% |
| 3 | 28 | 13.2% |
| 7 | 21 | 9.9% |
| 5 | 19 | 9.0% |
| 8 | 19 | 9.0% |
| 0 | 16 | 7.5% |
| 4 | 12 | 5.7% |
| 9 | 11 | 5.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 19 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 16 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 254 | 68.1% |
| Latin | 119 | 31.9% |
| Value | Count | Frequency (%) |
| Common | 228 | 66.9% |
| Latin | 113 | 33.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 44 | 37.0% |
| B | 25 | 21.0% |
| E | 17 | 14.3% |
| D | 15 | 12.6% |
| A | 8 | 6.7% |
| F | 6 | 5.0% |
| G | 3 | 2.5% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 33 | 29.2% |
| B | 31 | 27.4% |
| D | 18 | 15.9% |
| E | 12 | 10.6% |
| F | 8 | 7.1% |
| G | 6 | 5.3% |
| A | 4 | 3.5% |
| T | 1 | 0.9% |
Common
| Value | Count | Frequency (%) |
| 2 | 42 | 16.5% |
| 1 | 37 | 14.6% |
| 5 | 26 | 10.2% |
| 3 | 25 | 9.8% |
| 6 | 22 | 8.7% |
| (space) | 19 | 7.5% |
| 4 | 19 | 7.5% |
| 0 | 18 | 7.1% |
| 7 | 17 | 6.7% |
| 8 | 16 | 6.3% |
| Value | Count | Frequency (%) |
| 1 | 30 | 13.2% |
| 6 | 28 | 12.3% |
| 2 | 28 | 12.3% |
| 3 | 28 | 12.3% |
| 7 | 21 | 9.2% |
| 5 | 19 | 8.3% |
| 8 | 19 | 8.3% |
| (space) | 16 | 7.0% |
| 0 | 16 | 7.0% |
| 4 | 12 | 5.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 373 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 341 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 44 | 11.8% |
| 2 | 42 | 11.3% |
| 1 | 37 | 9.9% |
| 5 | 26 | 7.0% |
| 3 | 25 | 6.7% |
| B | 25 | 6.7% |
| 6 | 22 | 5.9% |
| (space) | 19 | 5.1% |
| 4 | 19 | 5.1% |
| 0 | 18 | 4.8% |
| Other values (9) | 96 | 25.7% |
| Value | Count | Frequency (%) |
| C | 33 | 9.7% |
| B | 31 | 9.1% |
| 1 | 30 | 8.8% |
| 6 | 28 | 8.2% |
| 2 | 28 | 8.2% |
| 3 | 28 | 8.2% |
| 7 | 21 | 6.2% |
| 5 | 19 | 5.6% |
| 8 | 19 | 5.6% |
| D | 18 | 5.3% |
| Other values (9) | 86 | 25.2% |
Embarked
Categorical
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 2 | 0 |
| Missing (%) | 0.4% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 444 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | S | S |
| 2nd row | C | Q |
| 3rd row | S | S |
| 4th row | S | S |
| 5th row | C | S |
Common Values
| Value | Count | Frequency (%) |
| S | 329 | 73.8% |
| C | 80 | 17.9% |
| Q | 35 | 7.8% |
| (Missing) | 2 | 0.4% |
| Value | Count | Frequency (%) |
| S | 326 | 73.1% |
| C | 80 | 17.9% |
| Q | 40 | 9.0% |
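Frequency (%) in the Common Values tables is each count divided by the column total. A minimal sketch, using the Embarked counts listed above, shows the arithmetic; the `frequencies` helper and the dictionary names are illustrative, and one-decimal rounding is assumed to match the report's display.

```python
# Embarked counts copied from the Common Values tables above.
counts_a = {"S": 329, "C": 80, "Q": 35, "(Missing)": 2}
counts_b = {"S": 326, "C": 80, "Q": 40}


def frequencies(counts):
    """Return each value's share of the total, formatted to one decimal."""
    total = sum(counts.values())
    return {value: f"{count / total:.1%}" for value, count in counts.items()}


freq_a = frequencies(counts_a)
freq_b = frequencies(counts_b)
print(freq_a)
print(freq_b)
```

Here the denominator for Dataset A is all 446 rows, missing values included, which gives 17.9% for C. The character-level tables further down count only the 444 non-missing characters, which is why the same 80 occurrences of C appear there as 18.0%.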
| Value | Count | Frequency (%) |
| s | 329 | 74.1% |
| c | 80 | 18.0% |
| q | 35 | 7.9% |
| Value | Count | Frequency (%) |
| s | 326 | 73.1% |
| c | 80 | 17.9% |
| q | 40 | 9.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 329 | 74.1% |
| C | 80 | 18.0% |
| Q | 35 | 7.9% |
| Value | Count | Frequency (%) |
| S | 326 | 73.1% |
| C | 80 | 17.9% |
| Q | 40 | 9.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 444 | 100.0% |
| Value | Count | Frequency (%) |
| Uppercase Letter | 446 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 329 | 74.1% |
| C | 80 | 18.0% |
| Q | 35 | 7.9% |
| Value | Count | Frequency (%) |
| S | 326 | 73.1% |
| C | 80 | 17.9% |
| Q | 40 | 9.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 444 | 100.0% |
| Value | Count | Frequency (%) |
| Latin | 446 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 329 | 74.1% |
| C | 80 | 18.0% |
| Q | 35 | 7.9% |
| Value | Count | Frequency (%) |
| S | 326 | 73.1% |
| C | 80 | 17.9% |
| Q | 40 | 9.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 444 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 446 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 329 | 74.1% |
| C | 80 | 18.0% |
| Q | 35 | 7.9% |
| Value | Count | Frequency (%) |
| S | 326 | 73.1% |
| C | 80 | 17.9% |
| Q | 40 | 9.0% |
Dataset A
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | -0.021 | -0.061 | 0.005 | -0.002 | 0.079 | 0.016 | 0.047 | 0.000 |
| Age | -0.021 | 1.000 | -0.169 | -0.238 | 0.107 | 0.178 | 0.268 | 0.157 | 0.000 |
| SibSp | -0.061 | -0.169 | 1.000 | 0.492 | 0.442 | 0.126 | 0.138 | 0.256 | 0.049 |
| Parch | 0.005 | -0.238 | 0.492 | 1.000 | 0.426 | 0.146 | 0.000 | 0.312 | 0.000 |
| Fare | -0.002 | 0.107 | 0.442 | 0.426 | 1.000 | 0.265 | 0.460 | 0.201 | 0.200 |
| Survived | 0.079 | 0.178 | 0.126 | 0.146 | 0.265 | 1.000 | 0.291 | 0.554 | 0.183 |
| Pclass | 0.016 | 0.268 | 0.138 | 0.000 | 0.460 | 0.291 | 1.000 | 0.102 | 0.289 |
| Sex | 0.047 | 0.157 | 0.256 | 0.312 | 0.201 | 0.554 | 0.102 | 1.000 | 0.092 |
| Embarked | 0.000 | 0.000 | 0.049 | 0.000 | 0.200 | 0.183 | 0.289 | 0.092 | 1.000 |
Dataset B
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.021 | -0.052 | 0.048 | 0.046 | 0.067 | 0.101 | 0.095 | 0.000 |
| Age | 0.021 | 1.000 | -0.132 | -0.215 | 0.054 | 0.155 | 0.216 | 0.021 | 0.000 |
| SibSp | -0.052 | -0.132 | 1.000 | 0.424 | 0.464 | 0.170 | 0.122 | 0.230 | 0.056 |
| Parch | 0.048 | -0.215 | 0.424 | 1.000 | 0.427 | 0.185 | 0.000 | 0.252 | 0.000 |
| Fare | 0.046 | 0.054 | 0.464 | 0.427 | 1.000 | 0.296 | 0.489 | 0.195 | 0.206 |
| Survived | 0.067 | 0.155 | 0.170 | 0.185 | 0.296 | 1.000 | 0.353 | 0.547 | 0.123 |
| Pclass | 0.101 | 0.216 | 0.122 | 0.000 | 0.489 | 0.353 | 1.000 | 0.092 | 0.256 |
| Sex | 0.095 | 0.021 | 0.230 | 0.252 | 0.195 | 0.547 | 0.092 | 1.000 | 0.113 |
| Embarked | 0.000 | 0.000 | 0.056 | 0.000 | 0.206 | 0.123 | 0.256 | 0.113 | 1.000 |
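One way to read the two matrices side by side is to flag the pairs that cross an alert threshold and to look at how far each coefficient moves between datasets. The sketch below does this for a handful of pairs copied from the tables above. The 0.5 threshold is an assumption chosen for illustration, not a setting confirmed by this report, and the variable names are hypothetical.

```python
# Selected off-diagonal coefficients copied from the matrices above,
# stored as (Dataset A, Dataset B).
pairs = {
    ("Survived", "Sex"): (0.554, 0.547),
    ("SibSp", "Parch"): (0.492, 0.424),
    ("Fare", "Pclass"): (0.460, 0.489),
    ("SibSp", "Fare"): (0.442, 0.464),
    ("Parch", "Fare"): (0.426, 0.427),
    ("Survived", "Pclass"): (0.291, 0.353),
}

THRESHOLD = 0.5  # assumed cut-off for calling a pair highly correlated

# Pairs at or above the threshold in either dataset.
high = [pair for pair, (a, b) in pairs.items()
        if abs(a) >= THRESHOLD or abs(b) >= THRESHOLD]

# Change in coefficient from Dataset A to Dataset B.
shift = {pair: round(b - a, 3) for pair, (a, b) in pairs.items()}
largest = max(shift, key=lambda pair: abs(shift[pair]))

print("high:", high)
print("largest shift:", largest, shift[largest])
```

Under that assumed threshold only Survived and Sex qualify in both datasets, and among the listed pairs the largest change between datasets is for SibSp and Parch.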
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 227 | 228 | 0 | 3 | Lovell, Mr. John Hall ("Henry") | male | 20.5 | 0 | 0 | A/5 21173 | 7.2500 | NaN | S |
| 839 | 840 | 1 | 1 | Marechal, Mr. Pierre | male | NaN | 0 | 0 | 11774 | 29.7000 | C47 | C |
| 418 | 419 | 0 | 2 | Matthews, Mr. William John | male | 30.0 | 0 | 0 | 28228 | 13.0000 | NaN | S |
| 419 | 420 | 0 | 3 | Van Impe, Miss. Catharina | female | 10.0 | 0 | 2 | 345773 | 24.1500 | NaN | S |
| 319 | 320 | 1 | 1 | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) | female | 40.0 | 1 | 1 | 16966 | 134.5000 | E34 | C |
| 179 | 180 | 0 | 3 | Leonard, Mr. Lionel | male | 36.0 | 0 | 0 | LINE | 0.0000 | NaN | S |
| 105 | 106 | 0 | 3 | Mionoff, Mr. Stoytcho | male | 28.0 | 0 | 0 | 349207 | 7.8958 | NaN | S |
| 439 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.5000 | NaN | S |
| 475 | 476 | 0 | 1 | Clifford, Mr. George Quincy | male | NaN | 0 | 0 | 110465 | 52.0000 | A14 | S |
| 221 | 222 | 0 | 2 | Bracken, Mr. James H | male | 27.0 | 0 | 0 | 220367 | 13.0000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 618 | 619 | 1 | 2 | Becker, Miss. Marion Louise | female | 4.0 | 2 | 1 | 230136 | 39.0000 | F4 | S |
| 718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN | 0 | 0 | 36568 | 15.5000 | NaN | Q |
| 407 | 408 | 1 | 2 | Richards, Master. William Rowe | male | 3.0 | 1 | 1 | 29106 | 18.7500 | NaN | S |
| 589 | 590 | 0 | 3 | Murdlin, Mr. Joseph | male | NaN | 0 | 0 | A./5. 3235 | 8.0500 | NaN | S |
| 138 | 139 | 0 | 3 | Osen, Mr. Olaf Elon | male | 16.0 | 0 | 0 | 7534 | 9.2167 | NaN | S |
| 32 | 33 | 1 | 3 | Glynn, Miss. Mary Agatha | female | NaN | 0 | 0 | 335677 | 7.7500 | NaN | Q |
| 367 | 368 | 1 | 3 | Moussa, Mrs. (Mantoura Boulos) | female | NaN | 0 | 0 | 2626 | 7.2292 | NaN | C |
| 338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.0 | 0 | 0 | 7598 | 8.0500 | NaN | S |
| 563 | 564 | 0 | 3 | Simmons, Mr. John | male | NaN | 0 | 0 | SOTON/OQ 392082 | 8.0500 | NaN | S |
| 767 | 768 | 0 | 3 | Mangan, Miss. Mary | female | 30.5 | 0 | 0 | 364850 | 7.7500 | NaN | Q |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 509 | 510 | 1 | 3 | Lang, Mr. Fang | male | 26.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 613 | 614 | 0 | 3 | Horgan, Mr. John | male | NaN | 0 | 0 | 370377 | 7.7500 | NaN | Q |
| 590 | 591 | 0 | 3 | Rintamaki, Mr. Matti | male | 35.0 | 0 | 0 | STON/O 2. 3101273 | 7.1250 | NaN | S |
| 588 | 589 | 0 | 3 | Gilinski, Mr. Eliezer | male | 22.0 | 0 | 0 | 14973 | 8.0500 | NaN | S |
| 289 | 290 | 1 | 3 | Connolly, Miss. Kate | female | 22.0 | 0 | 0 | 370373 | 7.7500 | NaN | Q |
| 211 | 212 | 1 | 2 | Cameron, Miss. Clear Annie | female | 35.0 | 0 | 0 | F.C.C. 13528 | 21.0000 | NaN | S |
| 606 | 607 | 0 | 3 | Karaic, Mr. Milan | male | 30.0 | 0 | 0 | 349246 | 7.8958 | NaN | S |
| 526 | 527 | 1 | 2 | Ridsdale, Miss. Lucy | female | 50.0 | 0 | 0 | W./C. 14258 | 10.5000 | NaN | S |
| 830 | 831 | 1 | 3 | Yasbeck, Mrs. Antoni (Selini Alexander) | female | 15.0 | 1 | 0 | 2659 | 14.4542 | NaN | C |
| 465 | 466 | 0 | 3 | Goncalves, Mr. Manuel Estanslas | male | 38.0 | 0 | 0 | SOTON/O.Q. 3101306 | 7.0500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46 | 47 | 0 | 3 | Lennon, Mr. Denis | male | NaN | 1 | 0 | 370371 | 15.5000 | NaN | Q |
| 196 | 197 | 0 | 3 | Mernagh, Mr. Robert | male | NaN | 0 | 0 | 368703 | 7.7500 | NaN | Q |
| 662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.00 | 0 | 0 | 5727 | 25.5875 | E58 | S |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.00 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 683 | 684 | 0 | 3 | Goodwin, Mr. Charles Edward | male | 14.00 | 5 | 2 | CA 2144 | 46.9000 | NaN | S |
| 342 | 343 | 0 | 2 | Collander, Mr. Erik Gustaf | male | 28.00 | 0 | 0 | 248740 | 13.0000 | NaN | S |
| 846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 279 | 280 | 1 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.00 | 1 | 1 | C.A. 2673 | 20.2500 | NaN | S |
| 803 | 804 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | NaN | C |
| 850 | 851 | 0 | 3 | Andersson, Master. Sigvard Harald Elias | male | 4.00 | 4 | 2 | 347082 | 31.2750 | NaN | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.